Introduction


This  is  the  final  report  of  a  research  project  which  began  in  the  summer  of 
1980  and  continued  through  the  end  of  1983.  This  report  will  summarize  the 
various  objectives  of  the  project,  the  approaches  followed  to  accomplish  these 
objectives,  and  the  major  results  obtained.  Appended  to  this  report  are  a  list
of  the  participating  personnel  and  a  list  of  the  various  scholarly  activities
associated  with  the  project  which  were  performed  by  these  personnel. 

Objectives

The  ultimate  goal  of  the  research  program  is  to  enhance  the  quality  of 
computer  software.  In  order  to  accomplish  this  goal,  however,  there  have  to 
be  agreed  upon  notions  of  just  what  quality  means  and  how  it  can  be  assessed. 
This  project  sought  to  make  contributions  to  our  understanding  of  these 
issues.

In  the  past  decade,  it  has  become  increasingly  popular  to  measure  features  of 
the  code  itself,  and  to  associate  the  resulting  metrics  with  quality.  One 
technique,  due  to  Halstead,  et  al.,  has  come  to  be  known  as  "software 
science".  The  metrics  of  software  science  consist  of  functions  of  the 
operators  and  operands  used  in  the  code.  Several  researchers  have  studied 
these  metrics,  and  many  have  found  apparent  relationships  between  the  metrics 
and  such  behavioral  characteristics  as  effort  to  write  the  code.  One  of  the
specific  objectives  of  this  project  was  to  study  the  software  science  metrics 
in  the  COBOL  arena,  as  the  earlier  work  had  not  really  been  concerned  with 
programs  written  in  this  language. 
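As an illustration only, the basic software science measures can be sketched from lists of tokens that have already been classified as operators or operands. The function name `halstead_basic` and the classification of the two COBOL statements shown are assumptions made for this example; they are not the counting strategy of the analyzer described in this report.

```python
import math

def halstead_basic(operators, operands):
    """Basic software science measures from pre-classified token lists."""
    n1 = len(set(operators))   # number of distinct operators
    n2 = len(set(operands))    # number of distinct operands
    N1 = len(operators)        # total operator occurrences
    N2 = len(operands)         # total operand occurrences
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    return {"n1": n1, "n2": n2, "N1": N1, "N2": N2,
            "vocabulary": vocabulary, "length": length, "volume": volume}

# Hypothetical classification of:  ADD A TO B.  MOVE B TO C.
# (verbs and TO counted as operators, data names as operands)
operators = ["ADD", "TO", "MOVE", "TO"]
operands = ["A", "B", "B", "C"]
measures = halstead_basic(operators, operands)
```

Which tokens count as operators and which as operands is a choice made by the person building the analyzer, and the measures change with that choice.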

Another  objective  of  this  research  concerned  the  evaluation  of  principles  of 
software  development.  In  many  software  engineering  articles  and  reports, 
authors  make  statements  concerning  the  DOs  and  DON'Ts  for  obtaining  good 
software.  These  statements,  however,  are  often  based  on  no  scientifically 
obtained  evidence.  Our  research  therefore  was  interested  in  seeing  the  extent 
to  which  some  of  these  principles  could  be  validated  in  a  controlled
laboratory  setting. 


The  implementation  of  experiments  to  test  hypotheses  in  software  engineering 
is  often  hampered  by  the  difficulty  in  measuring  aspects  of  quality  such  as 
understandability.  The  standard  means  for  assessing  understandability,  the
comprehension  test,  is  very  time-consuming  to  create.  We  therefore  sought  to 
examine  alternative  instruments  which  are  easier  to  create  but  which  are  still 
reliable  and  valid  means  of  measuring  one's  understanding  of  a  piece  of 
software.

Approach


To  apply  software  science  to  COBOL,  we  first  needed  to  create  a  software
tool  with  which  the  various  metrics  of  software  science  could  be
conveniently  obtained.  The  next  step  was  to  analyze  COBOL
programs  using  the  tool  to  see  if  the  relationships  from  software  science  for 
which  evidence  existed  in  other  programming  languages  could  be  shown  to  apply 
to  COBOL.  The  goal  here  was  to  use  a  variety  of  programs,  including  those 
written  by  students  in  computer  science  courses  as  well  as  those  written  for 
production  work  by  professional  programmers.  We  were  also  aware  of  an  effort
by  researchers  at  Purdue  University  in  which  COBOL  programs  were  being 
analyzed  using  the  software  science  metrics.  We  therefore  desired  to  compare 
our  results  with  those  obtained  by  the  Purdue  group. 

In  deciding  on  our  approach  to  evaluating  principles  of  software  development, 
we  noted  that  several  researchers  already  had  begun  to  deal  with  coding 
principles.  We  therefore  decided  to  focus  our  attention  on  the  design  phase
of  the  software  lifecycle,  where  several  principles  also  exist.  This  approach 
also  has  the  advantage  of  potentially  allowing  quality  decisions  to  be  reached 
earlier  in  the  development  process,  when  corrective  action  is  typically  less 
expensive. 

We  were  interested  in  two  types  of  investigations.  One  concerned  the 
evaluation  of  several  possible  quantitative  metrics  which  could  be  obtained 
from  design  documents,  to  see  which  appeared  to  provide  the  most  information 
about  the  quality  of  the  resulting  software.  The  second  concerned  the 
evaluation  of  specific  design  principles,  such  as  module  coupling,  in  a 
controlled  experimental  setting.  That  is,  we  sought  to  determine  if  software 
exhibiting  features  of  good  coupling  (such  as  lack  of  global  variable  usage)
could  be  shown  superior  to  similar  software  whose  coupling  was  poorer.

In  attacking  our  final  objective,  that  of  improving  the  means  of  assessing 
understandability,  we  noted  that  researchers  in  psychology  have  often  used  a
technique  called  the  "cloze  procedure"  as  an  alternative  to  (multiple  choice 
or  short  answer)  comprehension  questions.  The  cloze  procedure  is  one  which 
involves  filling  in  missing  blanks  in  a  passage  of  text.  The  greater  the 
subjects'  ability  to  fill  in  the  missing  pieces,  the  greater  the  understanding 
of  the  material.  Since  cloze  materials  are  very  easy  to  construct,  and  have 
received  some  credibility  in  other  domains,  we  decided  to  investigate  this 
technique  in  the  software  domain. 

Results 


A  tool  for  obtaining  the  software  science  metrics  from  COBOL  programs  was 
developed,  tested,  and  enhanced  during  the  project  period.  A  technical  report 
describing  the  tool's  design  and  use  was  published  and  provided  to  the  sponsor 
as  part  of  a  previous  progress  report.  A  large  number  of  programs,  from  both 
student  programming  courses  and  production  environments,  were  analyzed.  A 
technical  report  detailing  this  investigation  is  in  preparation,  but  the  major 
conclusions  are  as  follows. 

The  Halstead  length  relationship  held  up  reasonably  well  for  all  classes  of 
programs,  but  the  components  of  the  software  that  were  included  in  the 
operator  and  operand  counts  strongly  influenced  this  result.  For  instance,  in 
smaller  student  programs,  the  length  estimator  was  best  when  Data  Division  was 
included  in  the  counting  strategy.  For  larger  programs,  including  production 
programs,  counting  or  not  counting  Data  Division  didn't  seem  to  make  much 
difference.  Overall,  then,  it  appears  to  be  appropriate  to  include  Data 
Division  in  COBOL  studies  using  software  science  metrics.  In  fact,  this 
suggests  that  declarative  components  of  programs  in  other  languages  might  have 
strongly  influenced  earlier  results  in  software  science,  and  the  data  used  in 
these  early  studies  might  well  deserve  re-examination  along  these  lines  to  see 
whether  the  conclusions  reached  will  change.  If  so,  the  reaction  of  the
entire  computer  science  community  to  the  software  science  metrics  might  well
be  different.
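For readers unfamiliar with the length relationship, the following minimal sketch computes the Halstead length estimator and its relative error against the observed length. The function names and the counts in the usage line are hypothetical and are not taken from the programs studied. Under one counting strategy the counts would include Data Division tokens and under another they would not; the functions are the same in either case.

```python
import math

def estimated_length(n1, n2):
    """Halstead length estimator: n1*log2(n1) + n2*log2(n2)."""
    return n1 * math.log2(n1) + n2 * math.log2(n2)

def relative_error(n1, n2, N1, N2):
    """Signed relative error of the estimator against observed length N1 + N2."""
    observed = N1 + N2
    return (estimated_length(n1, n2) - observed) / observed

# Hypothetical counts for a single program (illustration only).
n_hat = estimated_length(16, 32)
error = relative_error(16, 32, 150, 130)
```

Running the same program through both counting strategies and comparing the two relative errors is one simple way to see how much the strategy matters.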

The  language  level  estimator  of  software  science  could  not  be  shown  to  be 
constant  in  any  of  our  studies.  We  recommend  that  it  not  be  used  as  a
metric  at  this  time,  especially  when  applied  to  an  individual  program. 
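The language level estimator referred to here is usually written as the square of the estimated program level times the volume. A minimal sketch, assuming that standard formulation, follows. The counts are hypothetical and the coefficient of variation is only one possible way to examine constancy; it is not necessarily the analysis performed in this project.

```python
import math
import statistics

def language_level(n1, n2, N1, N2):
    """Language level: (estimated program level)**2 * volume."""
    volume = (N1 + N2) * math.log2(n1 + n2)
    level = (2.0 / n1) * (n2 / N2)   # estimated program level
    return level ** 2 * volume

# Hypothetical counts (n1, n2, N1, N2) for three programs; not data from the study.
programs = [(10, 20, 120, 100), (12, 35, 300, 260), (15, 60, 900, 800)]
levels = [language_level(*p) for p in programs]

# Coefficient of variation: a large value suggests the estimator is not constant.
spread = statistics.pstdev(levels) / statistics.mean(levels)
```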

The  Halstead  effort  metric  seemed  to  work  very  well  for  small  programs,  but 
was  not  very  good  for  larger  programs.  We  feel  that  the  effort  metric  does 
not  appropriately  capture  the  effort  required  to  integrate  program  components. 
Therefore,  small  programs  which  require  little  integration  effort  are 
relatively  unaffected  by  this  weakness,  while  larger  programs  can  be  greatly 
affected.
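The effort metric is commonly computed as difficulty times volume. The sketch below assumes that standard formulation; the function name is hypothetical. Note that the formula uses only operator and operand counts, so nothing in it refers to the interfaces between program components.

```python
import math

def halstead_effort(n1, n2, N1, N2):
    """Halstead effort: difficulty * volume."""
    volume = (N1 + N2) * math.log2(n1 + n2)
    difficulty = (n1 / 2.0) * (N2 / n2)   # reciprocal of the estimated program level
    return difficulty * volume

# Hypothetical counts chosen so the arithmetic is easy to follow:
# volume = 16 * log2(8) = 48, difficulty = 2 * 2 = 4.
example_effort = halstead_effort(4, 4, 8, 8)
```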

Other  metrics,  such  as  McCabe's  cyclomatic  number,  Kafura's  information  flow
metric,  and  Davis'  chunk  metric,  were  also  studied  in  the  "effort"  domain.
None  of  them  provided  consistently  good  results.  We  feel  that  there  are  to
date  no  really  valid  metrics  of  software  effort.  The  technical  report 
discusses  some  approaches  which  may  overcome  the  weaknesses  of  the  metrics 
studied  in  this  research. 
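Of the metrics named above, McCabe's cyclomatic number is the simplest to state: for a control flow graph with E edges, N nodes and P connected components it is E - N + 2P. A minimal sketch follows; the graph representation and node names are assumptions made for the example, and the other two metrics are not sketched here.

```python
def cyclomatic_number(graph, components=1):
    """McCabe's cyclomatic number E - N + 2P for a control flow graph.

    graph maps each node to the list of nodes it can branch to.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    edges = sum(len(targets) for targets in graph.values())
    return edges - len(nodes) + 2 * components

# Hypothetical flow graph: an IF/ELSE followed by a loop.
flow = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["loop"],
    "loop": ["body", "exit"],
    "body": ["loop"],
}
complexity = cyclomatic_number(flow)
```

The graph has 9 edges and 8 nodes, so the result is 3, which matches the two decision points plus one.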

The  metrics  obtained  by  our  COBOL  analyzer  were  compared  with  those  obtained 
using  a  COBOL  analyzer  developed  at  Purdue  University.  The  Purdue  analyzer 
was  implemented  on  Ohio  State's  Amdahl  computer  and  the  programs  analyzed  by 
the  Ohio  State  COBOL  analyzer  were  run  on  the  Purdue  analyzer.  There  were 
significant  differences  obtained  in  many  of  the  operator  and  operand  counts, 
due  to  differences  in  the  counting  strategies  employed.  The  effects  of  these 
differences  on  the  validity  of  the  Halstead  metrics,  however,  were  not
significant.  That  is,  when  one  test  program  was  compared  to  another  using  the 
OSU  analyzer,  the  results  were  similar  to  those  obtained  when  the  same  pair  of 
programs  were  compared  using  the  Purdue  analyzer.  It  turned  out  that  the  same 
conclusions  could  be  reached  concerning  the  length  estimator  and  the  language 
level  and  effort  estimators  using  either  analyzer,  for  the  set  of  programs 
studied.  However,  the  fact  that  the  absolute  values  of  the  metrics  changed 
significantly  cautions  against  relying  on  results  obtained  by
different  analyzers  when  comparing  programs.  This  makes  it  extremely
difficult  to  compare  results  obtained  by  different  authors  in  the  literature 
on  software  metrics.
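One way to picture the distinction between absolute values and relative comparisons is to check how often two analyzers order a pair of programs the same way. The sketch below is an illustration under that assumption and is not the comparison procedure used in the study. The function name and the metric values are hypothetical.

```python
from itertools import combinations

def pairwise_agreement(values_a, values_b):
    """Fraction of program pairs that both analyzers order the same way.

    Ties under either analyzer are counted as disagreement.
    """
    pairs = list(combinations(range(len(values_a)), 2))
    agree = sum(
        1 for i, j in pairs
        if (values_a[i] - values_a[j]) * (values_b[i] - values_b[j]) > 0
    )
    return agree / len(pairs)

# Hypothetical metric values for four programs under two analyzers.
# The absolute values differ, but the ordering of the programs does not.
analyzer_a = [120.0, 340.0, 95.0, 510.0]
analyzer_b = [150.0, 410.0, 130.0, 600.0]
agreement = pairwise_agreement(analyzer_a, analyzer_b)
```

In this made-up case the agreement is 1.0 even though no single value matches, which is the situation in which conclusions within one analyzer hold but values cannot be mixed across analyzers.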

Our  work  in  the  evaluation  of  design  methodology  principles  began  with  a  study 
of  several  designs  written  according  to  the  methodology  of  Structured  Design 
espoused  by  Yourdon,  et  al.,  in  a  production  environment.  Several  measurable
properties  of  these  designs  were  recorded,  and  their  relationship  to  errors  in 
development  of  the  resulting  system  was  analyzed.  It  turned  out  that  the 
design  features  most  closely  related  to  development  errors  were  those  which 
had  to  do  with  the  notion  of  coupling.  The  details  of  this  study  can  be  found 
in  Doug  Troy's  master's  thesis.

Next,  we  performed  several  experiments  to  determine  if  the  principles  which 
influence  the  level  of  coupling  can  be  shown  significant  in  a  controlled 
environmental  setting.  We  selected  the  feature  of  global  variable  usage  for
study  because  designs  which  differ  along  this  single  dimension  are  very  far 
apart  in  the  coupling  hierarchy  of  Structured  Design.  Furthermore,  materials 
for  a  controlled  experiment  which  differed  only  along  this  dimension  could 
easily  be  developed.  The  experiments  tested  this  coupling  feature  against 
such  attributes  as  understandability  and  modifiability.  However,  we  could  not
produce  any  main  effects  for  coupling  level  in  these  experiments.  These 
results  suggest  that  the  effect  of  using  global  interface  elements,  rather 
than  parameterized  elements,  is  at  best  a  second  order  influence  on  the 
quality  of  the  software  despite  the  prominent  role  which  it  plays  in  the 
coupling  hierarchy  of  Structured  Design. 
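The single dimension varied in these experiments can be pictured with two versions of the same small routine, one communicating through a global data element and one through parameters. This is a hypothetical Python illustration, not the experimental materials; all names are invented for the example.

```python
# Version 1: global (common) coupling. The routine reads and writes
# a data element shared by every routine in the module.
balance = 0

def post_global(amount):
    global balance
    balance += amount

# Version 2: parameterized (data) coupling. Everything the routine
# needs comes in as arguments and goes out as a return value.
def post_param(current_balance, amount):
    return current_balance + amount

post_global(5)
global_result = balance          # the effect is visible only through the global
param_result = post_param(0, 5)  # the effect is visible in the call itself
```

Both versions compute the same result; they differ only in how the interface is expressed, which is what allows materials that differ along this one dimension to be prepared easily.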

Toward  the  end  of  the  project  period,  we  attempted  to  find  other  aspects  of 
the  design  of  software  which  might  play  a  more  significant  role  in  the 
software's  understandability.  We  tested  two  versions  of  a  master  file  update 
program,  one  of  which  was  developed  according  to  a  top-down  design  philosophy, 
and  the  other  of  which  was  developed  using  data  abstraction  design  principles. 
The  two  versions  were  shown  to  differ  in  their  understandability.  In 
analyzing  these  differences,  it  seems  to  us  that  the  principle  of  module
cohesion  is  playing  an  important  role.  Future  research  will  attempt  to 
investigate  this  hypothesis  more  carefully. 


Our  investigations  of  the  cloze  procedure  as  a  technique  for  assessing  the
understandability  of  software  brought  some  interesting  results.  An
experimenter  using  a  cloze  technique  has  a  great  deal  of  freedom.  For
example,  it  is  common  to  delete  every  nth  word  from  prose  texts.  The  choice
of  n  does  not  seem  to  make  much  difference  (as  long  as  n  is  at  least,
say,  3),  and  n=5  is  common.  However,  when  applying  the
cloze  procedure  to  the  software  domain,  different  results  were  obtained  when
n=3  and  n=5  were  used.  This  has  a  great  bearing  on  the  validity  of  the  cloze
procedure.  We  also  noted  that  the  kinds  of  "words"  (things  separated  by
blanks  or  punctuation)  that  were  deleted  made  a  significant  difference  in  the
performance  of  the  subjects.  We  attempted  to  classify  the  nature  of  these
deletions  to  explain  these  variations,  and  were  fairly  successful  in  this
regard.  The  classification  also  predicted  the  results  of  our  later
experiments  and  those  of  another  researcher  who  has  studied  the  cloze
procedure  in  the  software  domain.  The  forthcoming  dissertation  by  Bill  Hall
will  describe  the  details  of  these  experiments  and  the  classification  scheme.
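The construction of a cloze test by deleting every nth word can be sketched as follows. This is a minimal illustration, not the procedure used to prepare the experimental materials: the function name is hypothetical, and treating a hyphenated COBOL name as a single "word" is an assumption of the example.

```python
import re

def make_cloze(text, n, blank="_____"):
    """Replace every nth word with a blank; return the test text and the answers."""
    # Splitting on a capturing group keeps the separators, so words sit at
    # the odd indices and blanks or punctuation at the even indices.
    parts = re.split(r"([A-Za-z0-9-]+)", text)
    answers = []
    word_number = 0
    for i in range(1, len(parts), 2):
        word_number += 1
        if word_number % n == 0:
            answers.append(parts[i])
            parts[i] = blank
    return "".join(parts), answers

# Hypothetical source fragment.
source = "MOVE A TO B. ADD B TO TOTAL-COUNT."
test_text, answer_key = make_cloze(source, 3)
```

With n=3 the third and sixth words are deleted, giving `MOVE A _____ B. ADD _____ TO TOTAL-COUNT.` with the answer key `["TO", "B"]`. Changing n changes which kinds of tokens (reserved words, data names, literals) are blanked out.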

The  work  involving  the  cloze  procedure  has  suggested  several  other  kinds  of 
experiments  which  can  (and  probably  should)  be  done  to  determine  the
generality  of  our  results.  These  include  varying  such  factors  as  programming
language,  subject  experience  and  problem  domain.  However,  by  far  the  more 
exciting  outcome  of  this  research  has  been  its  potential  for  more  clearly 
identifying  the  characteristics  of  software  that  make  it  difficult  to 
comprehend.  As  more  is  learned  about  classifying  the  "easy"  and  "hard"  parts 
to  complete  in  cloze  tests,  more  useful  code  metrics  (at  least)  may  well 
result,  and  we  may  get  closer  to  our  objective  of  trying  to  quantify  (and 
thereby  possibly  improve)  at  least  one  important  dimension  of  software 
quality.